Introduction
Accurate molecular subtyping of acute leukemia (AL) is imperative for optimal risk stratification and treatment selection. Currently, diagnostic categories in AL are based on morphologic, immunophenotypic, and genetic features. However, these methods do not capture the full biological heterogeneity observed in AL, limiting further refinement of diagnostic and predictive categories. Additionally, current diagnostic approaches require a multitude of tests to be run in parallel, which can be costly, require special expertise, and lead to delays in care.
To address these challenges, we aimed to develop a clinically applicable framework to rapidly classify AL using genome-wide DNA methylation profiling and machine learning. We applied our cross-platform classification tool to a variety of DNA methylation data types, including primary AL samples profiled by our group with Nanopore sequencing, and evaluated its use for rapid sample classification in real time.
Results
To generate a methylation-based AL reference, we integrated >2,500 samples and controls from 11 public datasets (based on Illumina 450k/EPIC arrays), including both adult and pediatric cases. We defined 43 methylation classes covering AML (n=21), B-ALL (n=11), T-ALL (n=5), ALAL (n=1), and controls. Methylation-based classification closely matched lineage classification by standard pathology evaluation in most patients. Samples with lineage-ambiguous alterations, including BCL11B-activating alterations, ZNF384-r, and PICALM::MLLT10, were associated with distinct methylation classes despite differing immunophenotypes. Within lineages, many methylation classes closely recapitulated genetic categories, such as PML::RARA and CEBPA-mut AML, or ETV6::RUNX1 and DUX4-r B-ALL. For other AL subtypes, methylation profiling revealed disease heterogeneity beyond that captured by standard genetic categories. For example, we identified nine distinct HOX-activated methylation classes in AML, which cut across canonical driver alterations (e.g., KMT2A-r, NPM1-mut, NUP98-r) and other emerging genetic subtypes (e.g., UBTF-ITD).
Based on this reference, we developed a general model for epigenomic classification of incoming AL samples. We trained a deep neural network that can be applied to a variety of data types, including sparse Nanopore sequencing data. Our model achieved a median F1 score of 0.91 in cross-validation and retained good performance when given only 3% of the input data (r=0.93). To demonstrate its general applicability, we successfully applied our model to additional external datasets (27k/450k methylation array, WGBS, and Nanopore sequencing).
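As an illustration of the summary metric quoted above, the following minimal sketch computes a per-class F1 score and takes the median across classes. It assumes the median is taken over methylation classes (the abstract does not state whether it is over classes or cross-validation folds), and the function names and toy labels are hypothetical, not the authors' code or data.

```python
from statistics import median


def per_class_f1(y_true, y_pred):
    """Return {class: F1} for every class seen in the true or predicted labels."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def median_f1(y_true, y_pred):
    """Median of the per-class F1 scores (assumed aggregation, see lead-in)."""
    return median(per_class_f1(y_true, y_pred).values())


# Toy labels for illustration only; these are not results from the study.
y_true = ["PML::RARA", "PML::RARA", "CEBPA-mut", "CEBPA-mut", "DUX4-r", "DUX4-r"]
y_pred = ["PML::RARA", "PML::RARA", "CEBPA-mut", "DUX4-r", "DUX4-r", "DUX4-r"]

if __name__ == "__main__":
    print(per_class_f1(y_true, y_pred))
    print(median_f1(y_true, y_pred))
```

A median over classes is less sensitive than a mean to a few poorly resolved classes, which may be why it is a convenient summary when the reference contains many small classes.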
To clinically evaluate our approach, we performed Nanopore methylation profiling on 19 retrospective AL samples at our institution. At an average depth of 3-fold genomic coverage, our classifier generated confident methylation class predictions in 18 cases (94.7%). In 15 of these 18 cases, the class prediction was concordant with the conventional diagnosis. Of the remaining three cases, two were reclassified according to the methylation class after case review (cryptic DUX4-r, CEBPA-mut).
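The proportions in this paragraph follow directly from the reported counts. The short sketch below reproduces that arithmetic; the helper name `percent` is hypothetical, and the concordance percentage is derived here from the stated counts rather than quoted from the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# Counts as stated in the text above.
profiled = 19    # retrospective samples profiled with Nanopore sequencing
confident = 18   # samples with a confident methylation class prediction
concordant = 15  # confident predictions matching the conventional diagnosis

confident_rate = percent(confident, profiled)     # reported in the text as 94.7%
concordance_rate = percent(concordant, confident)  # derived, not reported

if __name__ == "__main__":
    print(confident_rate, concordance_rate)
```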
As a proof of concept for rapid classification, we sequenced two prospective samples from patients with suspected AL and generated predictions in real time. In both cases, a confident classification was made within one hour of sequencing time and less than two hours after receiving the sample. Conventional diagnosis, which was obtained several days later, confirmed our rapid epigenomic classification.
Conclusion
We present a novel machine learning framework that utilizes genome-wide DNA methylation profiling to accurately classify AL. Our approach offers a number of key advantages. Single-assay epigenetic profiling effectively resolves the biological heterogeneity of AL and is a valid surrogate for many conventional diagnostic assays. When our multi-platform classifier is coupled to emerging Nanopore sequencing, clinically actionable results can be generated within hours. In addition, Nanopore sequencing is affordable and easy to implement, making it suitable not only for high-tech laboratories but also for remote settings. We believe that our framework has great potential for application in clinical routines and in basic research, and provides a foundation for future developments in machine learning-assisted AL diagnostics.
Schliemann: Jazz Pharmaceuticals: Honoraria, Other: Travel- & congress-support, Research Funding; Novartis: Honoraria, Membership on an entity's Board of Directors or advisory committees; Laboratories Delbert: Honoraria, Membership on an entity's Board of Directors or advisory committees; AbbVie: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Travel- & congress-support; AstraZeneca: Honoraria; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Travel- & congress-support; Servier: Honoraria, Membership on an entity's Board of Directors or advisory committees; Astellas: Honoraria, Membership on an entity's Board of Directors or advisory committees; Anturec Pharmaceuticals: Research Funding; Roche: Honoraria, Membership on an entity's Board of Directors or advisory committees; Pfizer: Honoraria, Other: Travel- & congress-support. Chen: AbbVie: Consultancy; Rigel: Consultancy.